Multilingual Computational Semantic Lexicons in Action: The WYSINNWYG Approach to NLP
Abstract
Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put into the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic. A sense enumeration approach fails from a theoretical point of view to capture the core meaning of words, let alone relate word meanings to one another, and complicates the task of NLP by multiplying ambiguities in analysis and choices in generation. In this paper, I study computational semantic lexicon representation from a multilingual point of view, reconciling different approaches to lexicon representation: i) vagueness for lexemes which have a more or less fine-grained semantics with respect to other languages; ii) underspecification for lexemes which have multiple related facets; and iii) lexical rules to relate systematic polysemy to systematic ambiguity. I build on a What You See Is Not Necessarily What You Get (WYSINNWYG) approach to provide the NLP system with the "right" lexical data already tuned towards a particular task. In order to do so, I argue for a lexical semantic approach to lexicon representation. I exemplify my study through a cross-linguistic investigation on spatially-based expressions.
1 A Cross-linguistic Investigation on Spatially-based Expressions
In this paper, I argue for computational semantic lexicons as active knowledge sources in order to provide Natural Language Processing (NLP) systems with the "right" lexical semantic representation to accomplish a particular task. In other words, lexicon entries are "pre-digested", via a lexical processor, to best fit an NLP task. This What You See (in your lexicon) Is Not Necessarily What You Get (as input to your program) (WYSINNWYG) approach requires the adoption of a symbolic paradigm.
Formally, I use a combination of three different approaches to lexicon representation: (1) lexico-semantic vagueness, for lexemes which have a more or less fine-grained semantics with respect to other languages (for instance, en in Spanish is vague between the Contact and Container senses of Location, whereas English is finer grained, with on for the former and in for the latter); (2) lexico-semantic underspecification, for lexemes which have multiple related facets (for instance, door, which is underspecified with respect to its Aperture or PhysicalObject meanings); and (3) lexical rules, to relate systematic polysemy to systematic ambiguity (such …
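The first two representation strategies can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `ONTOLOGY` table, the `Entry` class, and the functions `translations` and `readings` are hypothetical names introduced here, and only the lexical examples (Spanish en, English on and in, door) come from the text above. The `expand` flag is an assumed stand-in for the lexical processor's task-dependent "pre-digestion" of entries. Lexical rules are left out of the sketch.

```python
from dataclasses import dataclass

# Hypothetical concept hierarchy (child -> parent), built from the text's examples.
ONTOLOGY = {"Contact": "Location", "Container": "Location"}


@dataclass(frozen=True)
class Entry:
    lemma: str
    lang: str
    concept: str = ""   # one concept, possibly a vague (general) one
    facets: tuple = ()  # related facets of an underspecified lexeme


LEXICON = [
    Entry("en", "es", concept="Location"),   # vague between Contact and Container
    Entry("on", "en", concept="Contact"),
    Entry("in", "en", concept="Container"),
    Entry("door", "en", facets=("Aperture", "PhysicalObject")),  # underspecified
]


def subsumes(general, specific):
    """Return True if `general` is `specific` or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = ONTOLOGY.get(specific)
    return False


def lookup(lemma, lang):
    """Return the first entry for `lemma` in language `lang`."""
    return next(e for e in LEXICON if e.lemma == lemma and e.lang == lang)


def translations(lemma, src, tgt):
    """Target lemmas whose concept is more general or more specific than the source's."""
    source = lookup(lemma, src)
    return sorted(
        e.lemma
        for e in LEXICON
        if e.lang == tgt and e.concept
        and (subsumes(source.concept, e.concept) or subsumes(e.concept, source.concept))
    )


def readings(lemma, lang, expand):
    """Assumed lexical-processor view: keep facets packed, or expand one reading per facet."""
    entry = lookup(lemma, lang)
    if not entry.facets:
        return [(entry.concept,)]
    if expand:
        return [(facet,) for facet in entry.facets]
    return [entry.facets]


print(translations("en", "es", "en"))        # the vague Spanish entry maps to both English lemmas
print(readings("door", "en", expand=False))  # one packed, underspecified reading
print(readings("door", "en", expand=True))   # one reading per facet
```

Under these assumptions, the vague entry for en is stored once and only splits into on and in when a finer-grained target language requires it, while door stays a single entry whose facets are unpacked on demand rather than enumerated as separate senses.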
Related resources
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), one that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...
Development of the Multilingual Semantic Annotation System
This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an ...
Multilingual lexicons for related languages
The great increase in work on the lexicon by computational and theoretical linguists throughout the s has concerned itself almost exclusively with monolingual lexicons. Meanwhile, applied work on multilingual lexicons, mostly for machine translation (MT), has employed monolingual lexicons linked only at the level of semantics. In this paper we argue that the traditional MT lexicon architecture while a...
NLP lexicons: innovative constructions and usages for machines and humans
Lexical resources have undergone significant changes with the generalized use of computers and the advent of the Internet. However, while such changes amount to revolutions when it comes to comparing machine-readable dictionaries to their paper 'ancestors', machine-readable dictionaries, compiled for human readers, still have serious limitations. Natural language processing lexicons, initially de...